Accurate diagnosis and prognosis of Alzheimer's disease are crucial for developing new therapies and reducing the associated costs. Recently, with the advances of convolutional neural networks, methods have been proposed to automate these two tasks using structural MRI. However, these methods often suffer from a lack of interpretability and generalization, and can be limited in terms of performance. In this paper, we propose a novel deep framework designed to overcome these limitations. Our framework consists of two stages. In the first stage, we propose a deep grading model to extract meaningful features. To enhance the robustness of these features against domain shift, we introduce an innovative collective artificial intelligence strategy for both the training and evaluation steps. In the second stage, we use a graph convolutional neural network to better capture AD signatures. Our experiments based on 2074 subjects show the competitive performance of our deep framework compared to state-of-the-art methods on different datasets for both AD diagnosis and prognosis.
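Detecting new multiple sclerosis (MS) lesions is an important marker of disease evolution. Learning-based methods could automate this task efficiently. However, the lack of annotated longitudinal data with new lesions is a limiting factor for training robust, generalizable models. In this work, we describe a learning-based pipeline that addresses the challenging task of detecting and segmenting new MS lesions. First, we propose to use transfer learning from a model trained on a segmentation task using single time points. We thereby exploit knowledge from an easier task for which more annotated datasets are available. Second, we propose a data synthesis strategy to generate new longitudinal time points from single-time-point scans. In this way, we pretrain our detection model on large synthetic annotated datasets. Finally, we use a data augmentation technique designed to simulate data diversity in MRI. By doing so, we increase the size of the small annotated longitudinal datasets available. Our ablation study shows that each contribution improves segmentation accuracy. Using the proposed pipeline, we obtained the best score for the segmentation and detection of new MS lesions in the MSSEG2 MICCAI challenge.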
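Alzheimer's disease and frontotemporal dementia are two major types of dementia. Their accurate diagnosis and differentiation are crucial for determining specific interventions and treatments. However, differential diagnosis of these two types of dementia remains difficult at the early stage of disease because of similar patterns of clinical symptoms. Automatic classification of multiple types of dementia therefore has important clinical value. So far, this challenge has not been actively explored. Recent developments in deep learning for medical imaging have demonstrated high performance on various classification tasks. In this paper, we propose to take advantage of two types of biomarkers: structure grading and structure atrophy. To this end, we first propose to train a large ensemble of 3D U-Nets to locally discriminate healthy from dementia anatomical patterns. The result of these models is an interpretable 3D grading map capable of indicating abnormal brain regions. This map can also be exploited in various classification tasks using a graph convolutional neural network. Finally, we propose to combine deep grading and atrophy-based classification to improve dementia type discrimination. The proposed framework shows competitive performance compared to state-of-the-art methods on both disease detection and differential diagnosis tasks.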
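Accurate diagnosis and prognosis of Alzheimer's disease are crucial for developing new therapies and reducing the associated costs. Recently, with the advances of convolutional neural networks, deep learning methods have been proposed to automate these two tasks using structural MRI. However, these methods often suffer from a lack of interpretability and generalization, and have limited prognosis performance. In this paper, we propose a novel deep framework designed to overcome these limitations. Our pipeline consists of two stages. In the first stage, 125 3D U-Nets are used to estimate voxelwise grade scores over the whole brain. The resulting 3D maps are then fused to construct an interpretable 3D grading map indicating disease severity at the structure level. As a result, clinicians can use this map to detect the brain structures affected by the disease. In the second stage, the grading map and the subject's age are used to perform classification with a graph convolutional neural network. Experimental results based on 216 subjects demonstrate the competitive performance of our deep framework compared to state-of-the-art methods on different datasets for both AD diagnosis and prognosis. Moreover, we found that using a large number of U-Nets processing different overlapping brain regions improves the generalization capacity of the proposed method.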
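The fusion of per-region outputs into a single grading map could be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the maps are fused, so plain averaging over overlapping regions is assumed here, and the function name `fuse_patch_maps` and the `(corner, patch)` input layout are hypothetical.

```python
import numpy as np


def fuse_patch_maps(volume_shape, patches):
    """Average overlapping 3D patch predictions into one volume.

    `patches` is an iterable of ((x, y, z), patch_array) pairs, where
    (x, y, z) is the corner of the patch inside the full volume.
    Averaging is an assumption; the source does not specify the fusion rule.
    """
    total = np.zeros(volume_shape, dtype=float)
    count = np.zeros(volume_shape, dtype=float)
    for (x, y, z), patch in patches:
        dx, dy, dz = patch.shape
        total[x:x + dx, y:y + dy, z:z + dz] += patch
        count[x:x + dx, y:y + dy, z:z + dz] += 1
    # Voxels that no patch covers are marked NaN instead of being set to 0.
    fused = np.full(volume_shape, np.nan)
    covered = count > 0
    fused[covered] = total[covered] / count[covered]
    return fused


# Toy example: two overlapping 3x3x3 patches in a 4x4x4 volume.
patch_a = np.ones((3, 3, 3))       # hypothetical "healthy-like" grades
patch_b = -np.ones((3, 3, 3))      # hypothetical "disease-like" grades
grading_map = fuse_patch_maps((4, 4, 4), [((0, 0, 0), patch_a), ((1, 1, 1), patch_b)])
```

In the toy example, voxels covered by both patches receive the mean of the two grades, while voxels covered by a single patch keep that patch's grade.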
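Sports video analysis is a prevalent research topic because of its variety of application areas, ranging from multimedia intelligent devices that deliver user-tailored digests to the analysis of athletes' performance. The Sports Video task is part of the MediaEval 2021 benchmark. This task tackles fine-grained action detection and classification from videos. The focus is on recordings of table tennis games. Running since 2019, the task has offered a classification challenge on untrimmed videos recorded in natural conditions, with known temporal boundaries for each stroke. This year, the dataset is extended and additionally offers a detection challenge on untrimmed videos without annotations. This work aims to create tools for sports coaches and players to analyze sports performance. Movement analysis and player profiling may be built upon such technology to enrich athletes' training experience and improve their performance.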
This article concerns Bayesian inference using deep linear networks with output dimension one. In the interpolating (zero noise) regime we show that with Gaussian weight priors and MSE negative log-likelihood loss both the predictive posterior and the Bayesian model evidence can be written in closed form in terms of a class of meromorphic special functions called Meijer-G functions. These results are non-asymptotic and hold for any training dataset, network depth, and hidden layer widths, giving exact solutions to Bayesian interpolation using a deep Gaussian process with a Euclidean covariance at each layer. Through novel asymptotic expansions of Meijer-G functions, a rich new picture of the role of depth emerges. Specifically, we find that the posteriors in deep linear networks with data-independent priors are the same as in shallow networks with evidence maximizing data-dependent priors. In this sense, deep linear networks make provably optimal predictions. We also prove that, starting from data-agnostic priors, Bayesian model evidence in wide networks is only maximized at infinite depth. This gives a principled reason to prefer deeper networks (at least in the linear case). Finally, our results show that with data-agnostic priors a novel notion of effective depth given by \[\#\text{hidden layers}\times\frac{\#\text{training data}}{\text{network width}}\] determines the Bayesian posterior in wide linear networks, giving rigorous new scaling laws for generalization error.
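As a small worked illustration of the effective depth quantity stated above, the sketch below evaluates it for two configurations. The function name and the example sizes are hypothetical and only serve to show how the three quantities combine.

```python
def effective_depth(num_hidden_layers: int, num_training_points: int, width: int) -> float:
    """Effective depth = (# hidden layers) * (# training data) / (network width)."""
    if width <= 0:
        raise ValueError("width must be positive")
    return num_hidden_layers * num_training_points / width


# Two made-up configurations with the same effective depth:
deep_and_wide = effective_depth(num_hidden_layers=10, num_training_points=500, width=1000)
shallow_and_narrow = effective_depth(num_hidden_layers=4, num_training_points=250, width=200)
```

Both configurations give the same value, which, per the abstract, is the quantity that determines the posterior in wide linear networks with data-agnostic priors.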
Vision-based tactile sensors have gained extensive attention in the robotics community. These sensors are expected to extract contact information, i.e., haptic information, during in-hand manipulation. This makes them a natural match for haptic feedback applications. In this paper, we propose a contact force estimation method using the vision-based tactile sensor DIGIT, and apply it to a position-force teleoperation architecture for force feedback. The force estimation is done by building a depth map to measure the deformation of the DIGIT gel surface and applying a regression algorithm to the estimated depth data and ground truth force data to obtain the depth-force relationship. The experiment is performed by constructing a grasping force feedback system with a haptic device as the leader robot and a parallel robot gripper as the follower robot, where the DIGIT sensor is attached to the tip of the robot gripper to estimate the contact force. The preliminary results show that the low-cost vision-based sensor can be used for force feedback applications.
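The depth-to-force regression step could look roughly like the sketch below. The abstract does not name the regression algorithm or the depth feature, so a polynomial least-squares fit on the mean depth of the map is assumed here; the function names and the synthetic calibration data are hypothetical.

```python
import numpy as np


def depth_feature(depth_map):
    """Reduce a 2D depth map to one scalar (assumed feature: mean indentation)."""
    return float(np.mean(depth_map))


def fit_depth_force(features, forces, degree=2):
    """Least-squares polynomial fit: force ~ p(depth feature)."""
    return np.polyfit(features, forces, degree)


def estimate_force(coeffs, feature):
    """Evaluate the fitted depth-force relationship at one feature value."""
    return float(np.polyval(coeffs, feature))


# Synthetic calibration data (made up, not measurements from the paper).
calib_features = np.linspace(0.0, 1.0, 20)
calib_forces = 3.0 * calib_features**2 + 0.5 * calib_features
coeffs = fit_depth_force(calib_features, calib_forces, degree=2)

# Estimate the force for a new, uniformly indented depth map.
new_map = np.full((8, 8), 0.5)
force = estimate_force(coeffs, depth_feature(new_map))
```

In a teleoperation loop, the estimated force would be what gets sent back to the haptic device on the leader side.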
Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. There is a vast literature analyzing different types of inference risks, ranging from membership inference to reconstruction attacks. Inspired by the success of games (i.e., probabilistic experiments) to study security properties in cryptography, some authors describe privacy inference risks in machine learning using a similar game-based style. However, adversary capabilities and goals are often stated in subtly different ways from one presentation to the other, which makes it hard to relate and compose results. In this paper, we present a game-based framework to systematize the body of knowledge on privacy inference risks in machine learning.
This paper presents a class of new, fast, non-trainable entropy-based confidence estimation methods for automatic speech recognition. We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word for Connectionist Temporal Classification (CTC) and Recurrent Neural Network Transducer (RNN-T) models. The proposed methods have similar computational complexity to the traditional method based on the maximum per-frame probability, but they are more adjustable, have a wider effective threshold range, and better push apart the confidence distributions of correct and incorrect words. We evaluate the proposed confidence measures on LibriSpeech test sets, and show that they are up to 2 and 4 times better than confidence estimation based on the maximum per-frame probability at detecting incorrect words for Conformer-CTC and Conformer-RNN-T models, respectively.
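The idea of normalizing per-frame entropy and aggregating it into a word-level confidence can be sketched as below. This is a minimal illustration, not the paper's exact measures: it assumes Shannon entropy with linear normalization by the maximum entropy, and the function names and aggregation options are hypothetical.

```python
import numpy as np


def frame_confidence(probs):
    """Map per-frame distributions of shape (frames, vocab) to confidences in [0, 1].

    Assumed normalization: 1 - H(p) / ln(vocab), so a uniform frame scores 0
    and a one-hot frame scores 1.
    """
    probs = np.asarray(probs, dtype=float)
    vocab_size = probs.shape[-1]
    safe = np.clip(probs, 1e-12, 1.0)  # avoid log(0); zero-probability terms contribute 0
    entropy = -np.sum(probs * np.log(safe), axis=-1)
    return 1.0 - entropy / np.log(vocab_size)


def word_confidence(frame_conf, aggregation="min"):
    """Aggregate the confidences of the frames that belong to one word."""
    frame_conf = np.asarray(frame_conf, dtype=float)
    if aggregation == "min":
        return float(frame_conf.min())
    if aggregation == "mean":
        return float(frame_conf.mean())
    if aggregation == "prod":
        return float(frame_conf.prod())
    raise ValueError(f"unknown aggregation: {aggregation}")


# Toy example: one uncertain frame and one confident frame over a 4-symbol vocabulary.
toy_probs = [[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]]
toy_conf = frame_confidence(toy_probs)
```

A word would then be flagged as likely incorrect when its aggregated confidence falls below a chosen threshold.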
Training a neural network requires choosing a suitable learning rate, involving a trade-off between speed and effectiveness of convergence. While there has been considerable theoretical and empirical analysis of how large the learning rate can be, most prior work focuses only on late-stage training. In this work, we introduce the maximal initial learning rate $\eta^{\ast}$: the largest learning rate at which a randomly initialized neural network can successfully begin training and achieve (at least) a given threshold accuracy. Using a simple approach to estimate $\eta^{\ast}$, we observe that in constant-width fully-connected ReLU networks, $\eta^{\ast}$ behaves differently from the maximum learning rate later in training. Specifically, we find that $\eta^{\ast}$ is well predicted as a power of $(\text{depth} \times \text{width})$, provided that (i) the width of the network is sufficiently large compared to the depth, and (ii) the input layer of the network is trained at a relatively small learning rate. We further analyze the relationship between $\eta^{\ast}$ and the sharpness $\lambda_{1}$ of the network at initialization, indicating that they are closely though not inversely related. We formally prove bounds for $\lambda_{1}$ in terms of $(\text{depth} \times \text{width})$ that align with our empirical results.
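A power-law relationship of the form $\eta^{\ast} \approx c \cdot (\text{depth} \times \text{width})^{\alpha}$ can be checked with a log-log least-squares fit, sketched below. The function name is hypothetical, and the data are synthetic values generated from a made-up exponent; they are not results from the paper.

```python
import numpy as np


def fit_power_law(depths, widths, etas):
    """Fit eta ~ c * (depth * width) ** alpha by linear regression in log-log space."""
    x = np.log(np.asarray(depths, dtype=float) * np.asarray(widths, dtype=float))
    y = np.log(np.asarray(etas, dtype=float))
    alpha, log_c = np.polyfit(x, y, 1)  # slope is the exponent, intercept is log(c)
    return float(np.exp(log_c)), float(alpha)


# Synthetic measurements generated with c = 3.0 and alpha = -0.5 (illustrative only).
depths = np.array([2, 4, 8, 16])
widths = np.array([64, 128, 256, 512])
etas = 3.0 * (depths * widths) ** -0.5
c_hat, alpha_hat = fit_power_law(depths, widths, etas)
```

With real measurements, the quality of the linear fit in log-log space would indicate how well the power law holds over the tested range of depths and widths.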